The Situation:

They wanted you to provide:

Market analysis

First, I will make a quick analysis of the Brazilian market. This analysis will help me choose which direction is best to follow, since I have limited time to try as many approaches as possible. As a first step, we only did data mining to prepare the data for the forecast analysis.
A remark on how I created the Average Pack Size groups: I used a boxplot to set the size of each group.
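A minimal sketch of how boxplot statistics can define pack-size groups, assuming the groups are cut at the first and third quartiles. The pack sizes, the group labels and the function names are hypothetical and are not taken from the dataset.

```python
from statistics import quantiles

def boxplot_cuts(values):
    """Return (Q1, median, Q3), the box edges and centre line of a boxplot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

def pack_size_group(size, cuts):
    """Label a pack size by the boxplot region it falls into (assumed rule)."""
    q1, _, q3 = cuts
    if size <= q1:
        return "small"   # at or below the lower edge of the box
    if size <= q3:
        return "medium"  # inside the box
    return "large"       # above the upper edge of the box

# Illustrative pack sizes only, not values from the report.
sizes = [20, 25, 30, 40, 45, 50, 90, 100, 150, 200, 500]
cuts = boxplot_cuts(sizes)
groups = [pack_size_group(s, cuts) for s in sizes]
```

With these illustrative sizes, the three smallest packs fall in the "small" group and the three largest in the "large" group.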

The first chart below, showing normalized volume and value, helps us understand the market's movement. In 2016, market volume decreased more sharply than market value, even though it decreased less than before, as the slope of the blue line shows. One question I have after June 2016 is whether price affects volume, since the red line is almost flat.
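The report does not say which normalization was used. As one possible sketch, both series can be rebased to their first observation so that their relative movements are comparable on a single axis. The numbers and names below are illustrative assumptions.

```python
def rebase_to_first(series):
    """Divide every point by the first one, so the series starts at 1.0."""
    base = series[0]
    if base == 0:
        raise ValueError("cannot rebase a series that starts at zero")
    return [x / base for x in series]

# Illustrative figures only, not values from the report.
volume = [120.0, 110.0, 90.0, 60.0]
value = [900.0, 880.0, 850.0, 800.0]

volume_idx = rebase_to_first(volume)
value_idx = rebase_to_first(value)
```

In this made-up example the rebased volume ends lower than the rebased value, which is the kind of gap the chart is meant to expose.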

To confirm the conclusion about price, note in the two charts below that the size of the points, which represents price, did not change over time.

To quickly understand which flavor has the most impact on the market, the chart below makes it easy to see that the Milk Chocolate flavor has a huge impact on the market result. It could therefore be interesting to run separate analyses: one cluster for Milk Chocolate and one cluster for the other flavors. Splitting the data into two clusters and zooming in on the second one, we can see that the cherry flavor, followed by the coffee flavor, is the most predominant.

Now you can check the data without normalization. I had to use three different charts because the range of values is large; plotting everything in a single chart would obscure the information.

Analysing the market by packaging material, it is easy to see that plastic is predominant. The column chart on the left shows market value and the one on the right shows market volume, both by packaging material. At the end of 2017 there seems to be an inverse correlation, possibly because campaigns against plastics are increasing nowadays, or it may be just a coincidence.

Caloric content has always been dominated by sugar, as the chart below shows.

Two package sizes lead the market over the others.

After understanding the market, we can select the most influential categories for running the forecast model. This usually helps computing performance because the model works with less data.

Another way to understand the table is to plot it as a line chart by year. The decreasing movement is visible not only in the chart across months but also if we look at the y axis. The information in the table above is confirmed by the charts.

Two more factors could have a strong influence on the market: Coverage and Shelf Life. The mean was calculated by year and month. As the charts show, coverage has been decreasing since January 2015, so it may be one of the factors with a strong influence on the market. Shelf life, overall, stays within the same range, and the seasonality at the end of the year is clear. The histogram and density chart help us detect any outliers that could distort the mean.
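A small sketch of the two steps described here: averaging a metric by year and month, and flagging outliers that could distort that average. The 1.5 x IQR rule is my assumption for the outlier check (the report only mentions a histogram and a density chart), and the rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, quantiles

def monthly_mean(rows):
    """Average a metric per (year, month); each row is (year, month, value)."""
    buckets = defaultdict(list)
    for year, month, value in rows:
        buckets[(year, month)].append(value)
    return {key: mean(vals) for key, vals in sorted(buckets.items())}

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Illustrative coverage readings only, not values from the report.
rows = [
    (2015, 1, 80.0), (2015, 1, 82.0),
    (2015, 2, 78.0), (2015, 2, 80.0),
    (2015, 3, 10.0),  # suspicious reading
]
coverage_by_month = monthly_mean(rows)
outliers = iqr_outliers([value for _, _, value in rows])
```

Checking the outliers before averaging matters because a single extreme reading, such as the last row above, would dominate the mean of its month.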

To help us identify strong correlations among the variables, the two charts below clarify any questions that may arise. We can conclude
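As a sketch of how the strength of a correlation between two variables can be measured, the Pearson coefficient is computed below. I am assuming Pearson correlation here; the report does not state which measure the charts use, and the series are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Illustrative series only, not values from the report.
coverage = [0.90, 0.85, 0.80, 0.75]
volume = [100.0, 92.0, 81.0, 70.0]
r = pearson(coverage, volume)
```

A coefficient close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.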

Forecast scenario

Now we can choose the best attributes so that the forecast works with less data while the result still describes the market well. The attributes are as follows:

We cannot assume a single cluster with these attributes combined, because each analysis was done independently. So a row is acceptable to work with if it has any one of these three attributes. The cluster that was created represents 97% of the total volume and 96% of the total value.
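The selection rule above (keep a row if it matches any one of the chosen attributes) and the resulting share of the market can be sketched as follows. The column names, attribute values and figures are hypothetical, and the sketch uses two attributes instead of three for brevity.

```python
def covered_share(rows, selected, measure):
    """Share of `measure` held by rows matching ANY selected attribute value."""
    total = sum(row[measure] for row in rows)
    kept = sum(
        row[measure]
        for row in rows
        if any(row.get(column) in values for column, values in selected.items())
    )
    return kept / total

# Illustrative rows only, not values from the report.
rows = [
    {"flavor": "Milk Chocolate", "material": "Plastic", "volume": 60.0, "value": 500.0},
    {"flavor": "Cherry", "material": "Paper", "volume": 25.0, "value": 300.0},
    {"flavor": "Vanilla", "material": "Plastic", "volume": 10.0, "value": 120.0},
    {"flavor": "Vanilla", "material": "Glass", "volume": 5.0, "value": 80.0},
]
# Hypothetical selection: attribute column -> accepted values.
selected = {"flavor": {"Milk Chocolate", "Cherry"}, "material": {"Plastic"}}

volume_share = covered_share(rows, selected, "volume")
value_share = covered_share(rows, selected, "value")
```

Because the match uses "any" instead of "all", the third row is kept through its material even though its flavor is not selected.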

The forecast model I used was Prophet, developed by Facebook. I will train the Prophet model on the data and analyse whether the error is acceptable before moving on.

Milk Chocolate

In the last month of 2017, the Lili brand dropped by 64.8% in volume share and Glen rose by 25.4% in volume share. If you analyse the Price Index, you can see that price has a strong influence: Lili raised its price by 40% over the market price, while Glen dropped its price dramatically, by 23% relative to the market price.
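For reference, a sketch of the two quantities used in this comparison, assuming the Price Index is the brand price relative to the market price (100 meaning parity) and that share movements are percentage changes on the previous share. Both definitions are my assumptions, and the inputs are illustrative.

```python
def price_index(brand_price, market_price):
    """Brand price relative to the market price; 100 means parity (assumed definition)."""
    return 100.0 * brand_price / market_price

def pct_change(previous, current):
    """Percentage change from `previous` to `current`."""
    return 100.0 * (current - previous) / previous

# Illustrative inputs only, not values from the report.
index_above_market = price_index(14.0, 10.0)  # brand priced above the market
share_move = pct_change(20.0, 15.0)           # share falling from 20 to 15
```

Under these definitions, an index of 140 corresponds to a price 40% above the market, and a negative percentage change corresponds to a loss of share.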

Lili was chosen for the forecast; even though its latest volume share dropped suddenly, this will not reflect the long term.

I adopted RMSE and MAPE to evaluate the model result. The dataset is not big enough to give confident accuracy; I would need a more powerful computer to work with a larger dataset.
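A minimal sketch of the two error metrics, using their standard definitions. The actual and predicted values are illustrative and are not results from the model.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined for zero actuals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Illustrative values only, not output from the forecast.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
error_rmse = rmse(actual, predicted)
error_mape = mape(actual, predicted)
```

RMSE penalizes large misses more heavily, while MAPE is scale-free, so the two together give a fuller picture of forecast quality than either alone.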

## Initial log joint probability = -5.92468
## Optimization terminated normally: 
##   Convergence detected: absolute parameter change was below tolerance

Cherry

In 2015 the Cherry flavor had a more even distribution of volume share. The following year, the Harley brand showed large variation throughout the year, which needs a deeper analysis to understand these unusual movements.

I will choose the Harley brand to apply the forecast model for 2018.

## Initial log joint probability = -4.56781
## Optimization terminated normally: 
##   Convergence detected: absolute parameter change was below tolerance

Forward Studies